One of the common questions that comes up is how to create a new model. Maybe you found a GGUF file on Hugging Face, or you converted a raw model weights file, or you want to save your system prompt as something you and your colleagues can run later on. There are a lot of reasons why you might want to create a new model.
Now I am sure there are some folks out there who will say that’s not a model…the model is what you find on Hugging Face. And that’s where Ollama is special. It recognizes that the model weights file you download from Hugging Face is useless without the template or the system prompt. So all of those should be combined to make up a complete model. And that’s what we are going to do here.
Some others are going to say that I have done this video before. But I am going to try something new here: staying on point. This is just about creating a new model and the corresponding modelfile. Tangents, which I often like to include, will go in their own…ummm…singles that I will post every now and then.
OK, so you found a GGUF file. Here I have one I grabbed from this repo, MaziyarPanahi/BioMistral-7B-GGUF. I downloaded a 4-bit quantization and it’s in this directory. I’ll rename it to a snake_case filename to make it a little easier to work with.
Now create a modelfile. I’ll add a single directive here: FROM, followed by the relative path to the file. FROM is in all caps, but that’s just to make it easier to read; the case doesn’t actually matter.
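For example, if the renamed file is biomistral_7b_q4.gguf (that name is just what I picked; yours will differ), the whole modelfile at this point is a single line:

    FROM ./biomistral_7b_q4.gguf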
Now I can exit out and try ollama create biomistral. Note that you will often see a -f argument with the path to the modelfile. If you call the modelfile just ‘modelfile’ and you run the command from that directory, you don’t have to include -f. And then ollama run biomistral.
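From that directory, the whole round trip looks something like this (the -f line shows the form you’d use if the modelfile has a different name or lives somewhere else; the path is just a placeholder):

    ollama create biomistral
    ollama create biomistral -f ./path/to/your-modelfile
    ollama run biomistral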
This is a model trained on medical data, so let’s ask a medical question: ‘what is a tumor?’ And we get an answer that is not very satisfactory. If we had created a model based on another model already in the Ollama library, it would inherit the parent’s template. But this one is built straight from the model weights we downloaded, so there is no template yet.
So we need to set the template. But how do we figure that out? The readme is the first place to look. Hopefully there is some indication of what the template should be. And here in the readme is some sample code. In our modelfile, add a TEMPLATE directive followed by two sets of three double quotes. Copy the template from the code sample and paste it between them. Now replace system_message with .System in double curly braces, and replace prompt with .Prompt, again with double curly braces instead of the single ones.
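Your readme will have its own template, but the substitution looks something like this. Suppose the sample code showed a generic instruction-style prompt (this exact layout is only an illustration, not necessarily what your readme contains):

    TEMPLATE """{{ .System }}

    ### Instruction:
    {{ .Prompt }}

    ### Response:
    """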
We can create the model again with ollama create biomistral, try running it, and ask ‘what is a tumor’ again. And that is a better answer.
Now figuring out the template and system prompt is a bit of an art rather than a science. We got an answer, but something about it felt wrong to me. When the person who created the GGUF file is not on the team that created the model weights, you sometimes have to be a bit suspicious. And I feel like I have seen that template with other, non-Mistral-related models. Since this model is based on Mistral, I looked up the Mistral model on Hugging Face at mistralai/Mistral-7B-Instruct-v0.2. And the template seems to be very different.
Let’s take a look at the repo for the BioMistral model that the original team created. Looking through there, I couldn’t find the template. So let’s go to Files and then click on ‘tokenizer_config.json’. This file often has really useful info. At the top we see a few special tokens. Then below is a chat template. This is a little harder to read, but it starts with ‘bos_token’, which just above is defined as an ‘s’ in angle brackets. And then it says instructions need to be given as user, then assistant, then user, then assistant, and so on. And there is no system prompt recognized by the model. If the role is user, you have ‘INST’ enclosed in square brackets, then the message content, then ‘/INST’ in square brackets. Then come any assistant messages. Then for another user prompt, include the INST tokens again and keep repeating. After the whole thing include the eos_token, which is defined below this as ‘/s’ in angle brackets. That is a lot different from the template in the GGUF repo’s readme, which seemed to be boilerplate from ‘TheBloke’.
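Translating that description into modelfile syntax, and leaving out a system section since the model doesn’t define one, gives something close to this. Treat it as a sketch based on my reading of the chat template rather than an official modelfile; I also left out the ‘s’ and ‘/s’ tokens on the assumption that the runtime adds them for you:

    TEMPLATE """[INST] {{ .Prompt }} [/INST]"""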
So let’s create the model again with ollama create biomistral and try our question again. This looks good. In testing before creating this video, I tried it a bunch of times and generally saw much better answers with the new template.
Now you can start to play around with the parameters. Temperature is a good one to try. Add PARAMETER temperature 0.2. Create the model and try the ‘what is a tumor’ question. Repeat with temperature set to 1.8. Do you see any difference? Ideally you would try each a few times, because you are usually not going to get the same answer every time. And what’s the range on temperature? It depends on which docs you read. I have seen 0 to 1, 0 to 2, 0 to 5, or even any non-zero number. Sometimes you get some interesting answers at 3 or 5 or higher.
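As a concrete example, the modelfile with a temperature set might look like this (the filename and the 0.2 value are just the ones I have been using in this walkthrough; swap in your own):

    FROM ./biomistral_7b_q4.gguf
    TEMPLATE """[INST] {{ .Prompt }} [/INST]"""
    PARAMETER temperature 0.2

Then run ollama create biomistral again and repeat the question.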
Sometimes you will see strange tokens appear in the output, so try adding those as stop tokens. You can find the full docs for the modelfile here. But remember…this stuff is definitely closer to an art than a science. Often the Ollama team publishes a model using what the team that created the weights suggests, it turns out to be not that effective, and it’s comments from the community that help tweak the configuration into something that works a lot better. If you run ‘ollama pull’ with the model name a few days after the release, you might get an updated model.
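A stop token goes in as another PARAMETER line, one per token. The token below is only an example; use whatever stray text actually shows up in your output:

    PARAMETER stop "</s>"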
Once you have the model done, you can push it to the Ollama library. You can learn more about doing that here. Lots of folks have published different models, and you can see a few folks have published this same model. Or just download my version at m/biomistral.
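If you do push your own copy, the flow is roughly: copy the model into your namespace, then push it. Here ‘yourusername’ is a placeholder for your ollama.com account name, and you will need your Ollama keys added to that account first:

    ollama cp biomistral yourusername/biomistral
    ollama push yourusername/biomistral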
Have you created any interesting models? Share them in the comments below.
Thanks so much for being here. Goodbye.